Indexing: properly block on shard building #689
I noticed this when I took a memory profile with
After, this is consistently only 700MB:
So this seems to fix an important issue. I need to look further into why this is still not closer to |
Nice catch!! So now we will have at most parallelism + 1 shards in memory, right? Since you can have parallelism sets of documents having buildShard called on them, and then 1 full todo slice waiting to have flush called on it?
I double checked the calls to flush and the use of the b.building waitgroup. I don't see any issues with potential deadlocks etc. LGTM!
Indeed now it will be at most |
@keegancsmith @stefanhengl general note about the indexing memory fixes: I plan to let these "bake" for ~2 weeks on S2 / dot com before backporting this to a 5.2 patch. I'm being pretty conservative since it's very core code and this logic hasn't been touched in a while.
When indexing, we build shards in parallel based on the `parallelism` flag. Each shard handles ~100MB of document contents, which should limit the memory usage to roughly `100MB * parallelism`. Looking at the size of the buffered document contents in memory profiles, we see much higher usage than this. The issue seems to be that we continue to buffer up documents even if all threads are busy building shards. This can be a real problem if shards take a super long time to build (say because ctags is slow) -- we could end up buffering a ton of content into memory at once. This change fixes the throttling logic so we block indexing when all threads are busy building shards.